NSF PAR Search | NSF Public Access Repository


CyBERT: Contextualized Embeddings for the Cybersecurity Domain

https://doi.org/10.1109/BigData52589.2021.9671824

Ranade, Priyanka; Piplai, Aritran; Joshi, Anupam; Finin, Tim (December 2021, IEEE International Conference on Big Data)

We present CyBERT, a domain-specific Bidirectional Encoder Representations from Transformers (BERT) model, fine-tuned with a large corpus of textual cybersecurity data. State-of-the-art natural language models that can process dense, fine-grained textual threat, attack, and vulnerability information can provide numerous benefits to the cybersecurity community. The primary contribution of this paper is providing the security community with an initial fine-tuned BERT model that can perform a variety of cybersecurity-specific downstream tasks with high accuracy and efficient use of resources. We create a cybersecurity corpus from open-source unstructured and semi-unstructured Cyber Threat Intelligence (CTI) data and use it to fine-tune a base BERT model with Masked Language Modeling (MLM) to recognize specialized cybersecurity entities. We evaluate the model using various downstream tasks that can benefit modern Security Operations Centers (SOCs). The fine-tuned CyBERT model outperforms the base BERT model in the domain-specific MLM evaluation. We also provide use cases of CyBERT application in cybersecurity-based downstream tasks.
Full Text Available
Recognizing and Extracting Cybersecurity Entities from Text

Hanks, Casey; Maiden, Michael; Ranade, Priyanka; Finin, Tim; Joshi, Anupam (January 2022, Workshop on Machine Learning for Cybersecurity, International Conference on Machine Learning)

Cyber Threat Intelligence (CTI) is information describing threat vectors, vulnerabilities, and attacks and is often used as training data for AI-based cyber defense systems such as Cybersecurity Knowledge Graphs (CKG). There is a strong need to develop community-accessible datasets to train existing AI-based cybersecurity pipelines to efficiently and accurately extract meaningful insights from CTI. We have created an initial unstructured CTI corpus from a variety of open sources, which we are using to train and test cybersecurity entity models with the spaCy framework, and we are exploring self-learning methods to automatically recognize cybersecurity entities. We also describe methods to apply cybersecurity domain entity linking with existing world knowledge from Wikidata. Our future work will survey and test spaCy NLP tools, and create methods for continuous integration of new information extracted from text.
Full Text Available
Generating Fake Cyber Threat Intelligence Using Transformer-Based Models

https://doi.org/10.1109/IJCNN52387.2021.9534192

Ranade, Priyanka; Piplai, Aritran; Mittal, Sudip; Joshi, Anupam; Finin, Tim (July 2021, 2021 International Joint Conference on Neural Networks (IJCNN))

Full Text Available
Cybersecurity Threat Intelligence Augmentation and Embedding Improvement - A Healthcare Usecase

https://doi.org/10.1109/ISI49825.2020.9280482

Sills, Matthew; Ranade, Priyanka; Mittal, Sudip (November 2020, 2020 IEEE International Conference on Intelligence and Security Informatics (ISI))
The implementation of Internet of Things (IoT) devices in medical environments has introduced a growing list of security vulnerabilities and threats. The lack of an extensible big data resource that captures medical device vulnerabilities limits the use of Artificial Intelligence (AI) based cyber defense systems in capturing, detecting, and preventing known and future attacks. We describe a system that generates a repository of Cyber Threat Intelligence (CTI) about various medical devices and their known vulnerabilities from sources such as manufacturer and ICS-CERT vulnerability alerts. We augment the intelligence repository with data sources such as Wikidata and public medical databases. The combined resources are integrated with threat intelligence in our Cybersecurity Knowledge Graph (CKG) from previous research. The augmented graph embeddings are useful in querying relevant information and can help in various AI-assisted cybersecurity tasks. Given the integration of multiple resources, we found the augmented CKG produced higher-quality graph representations. The augmented CKG produced a 31% increase in the Mean Average Precision (MAP) value, computed over an information retrieval task.
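For readers unfamiliar with the metric, the following is a minimal sketch of how Mean Average Precision is commonly computed over ranked retrieval results. It is not the authors' evaluation code. The function names and the toy relevance lists are hypothetical, and the sketch assumes every relevant item for a query appears somewhere in its ranked list.

```python
from typing import Sequence


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the item at rank i+1 is relevant.
    Assumption: all relevant items appear in the ranked list, so the
    sum is normalised by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)


# Made-up relevance judgements for two queries (illustration only,
# not data from the paper).
baseline = [[False, True, False, True], [False, False, True]]
augmented = [[True, False, True, False], [False, True, False]]

map_baseline = mean_average_precision(baseline)
map_augmented = mean_average_precision(augmented)
relative_increase = (map_augmented - map_baseline) / map_baseline
print(f"baseline MAP={map_baseline:.3f}, augmented MAP={map_augmented:.3f}, "
      f"relative increase={relative_increase:.0%}")
```

A relative increase such as the one reported in the abstract would be obtained by comparing the MAP of the two systems in this way, although the abstract does not state whether its 31% figure is relative or absolute.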
Full Text Available
Using Knowledge Graphs and Reinforcement Learning for Malware Analysis

https://doi.org/10.1109/BigData50022.2020.9378491

Piplai, Aritran; Ranade, Priyanka; Kotal, Anantaa; Mittal, Sudip; Narayanan, Sandeep Nair; Joshi, Anupam (December 2020, 2020 IEEE International Conference on Big Data (Big Data))
Machine learning algorithms used to detect attacks are limited by the fact that they cannot incorporate the background knowledge that an analyst has. This limits their suitability in detecting new attacks. Reinforcement learning is different from traditional machine learning algorithms used in the cybersecurity domain. Compared to traditional ML algorithms, reinforcement learning does not need a mapping of the input-output space or a specific user-defined metric to compare data points. This is important for the cybersecurity domain, especially for malware detection and mitigation, as not all problems have a single, known, correct answer. Often, security researchers have to resort to guided trial and error to understand the presence of malware and mitigate it. In this paper, we incorporate prior knowledge, represented as Cybersecurity Knowledge Graphs (CKGs), to guide the exploration of an RL algorithm to detect malware. CKGs capture semantic relationships between cyber-entities, including those mined from open sources. Instead of trying out random guesses and observing the change in the environment, we aim to use verified knowledge about cyber-attacks to guide our reinforcement learning algorithm to effectively identify ways to detect the presence of malicious filenames so that they can be deleted to mitigate a cyber-attack. We show that such a guided system outperforms a base RL system in detecting malware.
Full Text Available
Computational Understanding of Narratives: A Survey

https://doi.org/10.1109/ACCESS.2022.3205314

Ranade, Priyanka; Dey, Sanorita; Joshi, Anupam; Finin, Tim (January 2022, IEEE Access)
